A Scientific Approach to Practical Induction
Abstract
The purpose of practical induction is to create systems for powerful (efficient and effective) generalization learning. This paper argues that a scientific approach to practical induction promotes discovery of essential principles. Some have emerged from development of the author's learning systems, which have contributed promising methods and unique results.

INTRODUCTION (Induction in Science and in Machine Learning)

A scientist creates and tests intelligent hypotheses. Experiment may falsify a hypothesis H; on the other hand, repeated testing may support H, i.e. raise its credibility [26]. For example, H might be "Localization of credit improves machine learning." Because many implementations seem to support this, we tend to believe it (although it might be interesting to determine details [23]). The resources of science are limited, so we strive to direct efforts well, and we develop disciplines (methodologies) for this end. Powerful methodologies are both efficient and effective: they avoid poor hypotheses and promote discovery of credible ones.

Analogue in machine learning. Hypothesis formation is induction, which AI tries to mechanize (footnote 1). In theory, induction presents no problem: hypotheses can simply be generated and tested [2, 26]. In practice, however, the problem is so complex that effective and efficient methods for limiting search are imperative.

Practical induction: power = effectiveness + efficiency. The study of practical induction in machine learning has two broad goals: construction of powerful (effective and efficient) representations and algorithms, and discovery of principles underlying this power. Aspects include scope of application, noise management, computational complexity, convergence to optimal control structures, etc. [2, 6, 9, 21].

Search for principles. What are the essential ingredients of a powerful inductive system?
In confronting this question, some researchers have synthesized systems and created models, although this work is just beginning [2, 4, 6, 10, 20]. Despite the elusiveness of powerful induction, unified models have been aided by well-conceived systems. As is typical of science and engineering, theory guides design and experiment, which in turn hones theory.

Footnote 1. Mechanized induction inputs events or objects, and produces classes or concepts for prediction of future events. The importance of automated induction has been emphasized in, e.g., [10].

Thesis of this paper. In addition to experimentation, scientific methodology includes appropriate abstraction, inclination toward elegant theory, and determination of important relationships (which often become quantitative). The next section of this paper presents an abstraction useful for automated generalization learning. The third section analyzes inductive power. Throughout, we shall argue for scientific investigation of mechanized induction.

POWERFUL CLUSTERING (Ideas, Methods, and Clarifications)

Task utility for inductive guidance. The utility is directly related to domain of application [13, 16, 21]. Various measures are possible. Utility may be the value of an object in task performance, and it may be probabilistic [13-21]. The probability of task usefulness collapses to set membership in deterministic cases (a probabilistic utility subsumes positive and negative examples of a concept) [21]. The system PLS uses probabilistic methods to induce probabilistic utility, as utility provides a bridge between domain and induction (see the footnote sketching PLS). Utility embodies ideas of active, goal-directed perception [3, 8], which can contribute to inductive power.

Constraints.
Inductive power is related to restrictions imposed on data specification, on forms of classes or concepts, and on algorithmic processing [2, 21, 26]. For example, features (attributes of objects), selected by the user, are designed to compress data even before any mechanized induction [19, 20]. Further, utility almost always bears a smooth relationship to user-selected features. This allows meaningful clustering of objects in local neighborhoods of feature space. See [21] for further discussion and more references.

Cluster analysis. In our view, Samuel designed signature tables to compress similar utilities into feature space cells [25]. Much of this was not automated, whereas PLS1 and PLS2 mechanize the clustering. Cluster analysis is an established statistical technique for inductive inference which partitions similar objects into distinctive classes. Similarities and distinctions are formalized by the use of some (dis)similarity criterion. Normally the criterion depends only on features, but this simplification can cause problems [2].

Footnote. The following is a sketch of the probabilistic learning system, PLS (see [6, 13-23] for details):

Basic system capability. The original PLS1 is capable of efficient and effective generalization learning in domains for which features (attributes) can be defined and utility (performance) can be measured. PLS1 can handle noise, selecting features which are most discriminating despite error. While it can be applied to single concept learning [6], the system has been developed and tested in the difficult domain of heuristic search, which requires not only noise management, but also incremental learning and removal of bias from data acquired during task performance. The power of PLS1 has been demonstrated in comparisons with alternative methods [14, 23]. The system can discover optimal evaluation functions, a unique result [16, 20, 23].

System extension. PLS2 is a doubly layered learning system which uses both PLS1 and a genetic algorithm [7]. PLS2 operations performed on utility clusters include generalization, specialization, and reorganization. PLS2 is more stable, accurate, and efficient than its predecessor [18, 23].

A system for creation of new terms. A more ambitious project involves the sophisticated system PLS0, designed for substantial constructive induction [20, 22]. PLS0 uses knowledge layering and invariance of utility surfaces to create concepts from progressively validated components. This system appears suitable for problems which were previously intractable [22].

New kinds of clustering (Utility, Conceptual, and Higher-dimensional). Criteria based on something other than features are external criteria [1, p. 194]. Several years ago the author introduced utility similarity as a suitable external criterion when the induction relates to performance of some task [13, 14, 16, 21]. Utility similarity involves the whole data environment, not just features. Utility provides a firm basis for conceptual cohesiveness [10]. Clusters may be constrained, e.g. PLS uses feature space rectangles (conjunctions of attribute ranges). Compressing data into preconceived forms is conceptual clustering [10].

PLS0, the author's system for substantial constructive induction, originates a kind of clustering which groups not just attributes, or even simple utilities, but rather utility surfaces in subspaces of very primitive features. These surfaces represent interrelationships among components of objects. The process of clustering utility surfaces creates structure [22].

Disguised conceptual clustering. Superficially, Quinlan's ID3 [14] is different from Michalski's systems [10], or from the author's PLS1. But ID3 is a veiled form of utility clustering. ID3 selects attributes having the greatest ability to discriminate. So does PLS1. The utility dissimilarity of PLS1 is essentially the information of ID3.
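To make "ability to discriminate" concrete, the following is a minimal sketch of the entropy-based information measure commonly associated with ID3. The text gives no formula for the utility dissimilarity of PLS1, so this illustrates only the ID3 side of the comparison; the function names and the toy data are assumptions introduced here, not taken from either system.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by partitioning the examples on one attribute.

    One group is formed per attribute value, mirroring one branch per value.
    """
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Pick the attribute with the greatest ability to discriminate."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data: "shape" separates the classes perfectly, "size" not at all.
rows = [
    {"shape": "round", "size": "small"},
    {"shape": "round", "size": "large"},
    {"shape": "square", "size": "small"},
    {"shape": "square", "size": "large"},
]
labels = ["useful", "useful", "useless", "useless"]

chosen = best_attribute(rows, labels, ["shape", "size"])
```

In this toy case the labels play the role of a deterministic utility (set membership), which is the special case the text describes as a collapse of probabilistic utility.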
Once ID3 chooses an attribute, it constructs one branch of the discrimination tree for each attribute value. In contrast, the clustering algorithm of PLS1 splits sets of attribute values only when discrimination is thereby improved. This suggests an obvious modification of ID3, and argues for continued syntheses like [2, 6, 19, 20]. Utility is the sole basis for clustering in PLS1 and "clustering" in ID3.

WHAT PRODUCES POWER? (Principles)

This section suggests a few incipient principles which may underlie inductive power. All paragraphs but the last refer to mechanized induction.

Mediating structures. Discussed further in [20, 22], this is a proposed addition to Buchanan's model [4]. Successful systems tend to incorporate knowledge structures which mediate objects and concepts during inductive processing. These structures are varied. One codes growing assurance of provisional hypotheses (through probabilistic information in PLS1). Another mediating structure houses components of tentative concepts (in PLS0). PLS0 employs divide and conquer techniques to build knowledge in chunks of increasing complexity [20, 22]. Hypotheses, gradually and tentatively constructed on lower levels, become confirmed elements of higher level concepts. Consequently the time complexity is improved [22].

Representation of whole sets of hypotheses using boundaries. Mitchell's deterministic candidate elimination for version spaces [11] is efficient because limited boundaries represent whole sets of hypotheses (the boundaries gradually converge). The author's PLS1 is efficient (yet cautious) because tentative boundaries represent the restricted set of partially confirmed hypotheses (boundaries provisionally converge, with increasing assurance).

Multiple use of single events in credit localization. In traditional methods of optimization (e.g. hill climbing, response surface fitting), solving a problem contributes only a single datum.
In contrast, probabilistic learning systems like Samuel's checker player and PLS1 make use of every single event (e.g. each state in heuristic search). No one event can errantly overwhelm the system, but still, each one updates knowledge about every feature or feature space cell. A similar situation arises in PLS0, only it is much more pronounced. Here a single object provides information about a myriad of object components. (PLS0 focuses on the important ones.) This is reminiscent of schemata in genetic algorithms: a single structure codes and supports many combinations and generalizations of its components [7].

Mutual data support. As in the previous paragraph, this involves multiple use of scarce information for the inductive process. Mutual data support is a term coined by the author to express a subtle combination of phenomena. In many generalization algorithms (e.g. curve fitting, clustering), the agglomeration of similar events simultaneously promotes data compression, noise management, accuracy improvement, and concept formation. Mutual data support appears in various forms in all PLS systems. See [15-23], particularly [20, 22].

Proper system assessment. (How much knowledge is acquired?) This point refers not to mechanized induction, but to our inference about the power of systems. Precise assessment is important, not simply to know which methods are better, but also to help discover why they work well, in order to improve models, theories and designs. We need standards for answering questions such as: How difficult is the inductive task being studied? How much knowledge is acquired autonomously, versus the amount given by the user [21, 24]? To scientifically assess substantial learning in systems like PLS0, we need to quantify inductive difficulty of environments and inductive power of systems [19, 20, 21, 22]. This suggests analysis of computational complexity, and measurement of cost effectiveness.
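The principle of multiple use of single events, described earlier in this section, can be illustrated with a small sketch in which every observed event updates the utility estimate of the feature space cell it falls into, while no single event can dominate that estimate. This is an illustration only: the `Cell` class, the `update` function, and the smoothed frequency estimate are assumptions made here, not the update rule of PLS1 or of Samuel's checker player.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """A feature space rectangle: one half-open (low, high) range per feature."""
    ranges: tuple
    useful: int = 0   # events in this cell that proved useful to the task
    total: int = 0    # all events observed in this cell

    def contains(self, point):
        return all(lo <= x < hi for x, (lo, hi) in zip(point, self.ranges))

    def utility(self):
        # Smoothed frequency (an assumption): an empty cell reports 0.5,
        # and each further event shifts the estimate only slightly.
        return (self.useful + 1) / (self.total + 2)


def update(cells, point, was_useful):
    """Credit one event to the cell that contains it; return that cell."""
    for cell in cells:
        if cell.contains(point):
            cell.total += 1
            cell.useful += int(was_useful)
            return cell
    return None


# Hypothetical two-feature space split into two rectangles.
cells = [Cell(((0, 5), (0, 10))), Cell(((5, 10), (0, 10)))]
events = [((1, 2), True), ((2, 3), True), ((3, 1), False), ((7, 4), False)]
for point, was_useful in events:
    update(cells, point, was_useful)

estimates = [cell.utility() for cell in cells]
```

Each state visited during a search would contribute one such event, so a single solved problem yields many data points rather than one.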
CONCLUSIONS (Suitable scholarship)

In addition to specific methods, results, and contentions in or about mechanized practical induction (generalization learning), we have given a number of suggestions for scientific research in the field: Discovering equivalences in knowledge representations and algorithms is important for clear progress. So is quantification of the power of systems. Our machine learning investigations can also benefit from theoretical issues and results [2]. One example is the highly developed work on credibility criteria by Watanabe [26, pp. 154 ff.].
Publication date: 1985